Privacy Protection for Natural Language Records: Neural Generative Models for Releasing Synthetic Twitter Data

نویسندگان

  • Alexander G. Ororbia
  • Fridolin Linder
  • Joshua Snoke
چکیده

Redaction has been the most common approach to protecting text data, but synthetic data presents a potentially more reliable alternative for disclosure control. By producing new sample values which closely follow the original sample distribution but do not contain real values, privacy protection can be improved while utility from the data for specific purposes is maintained. We extend the synthetic data approach to natural language by developing a neural generative model for such data. We find that the synthetic models outperform simple redaction on both comparative risk and utility.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Generative Knowledge Transfer for Neural Language Models

In this paper, we propose a generative knowledge transfer technique that trains an RNN based language model (student network) using text and output probabilities generated from a previously trained RNN (teacher network). The text generation can be conducted by either the teacher or the student network. We can also improve the performance by taking the ensemble of soft labels obtained from multi...

متن کامل

Plausible Deniability for Privacy-Preserving Data Synthesis

Releasing full data records is one of the most challenging problems in data privacy. On the one hand, many of the popular techniques such as data de-identification are problematic because of their dependence on the background knowledge of adversaries. On the other hand, rigorous methods such as the exponential mechanism for differential privacy are often computationally impractical to use for r...

متن کامل

Differentially Private Releasing via Deep Generative Model (Technical Report)

Privacy-preserving releasing of complex data (e.g., image, text, audio) represents a long-standing challenge for the data mining research community. Due to rich semantics of the data and lack of a priori knowledge about the analysis task, excessive sanitization is often necessary to ensure privacy, leading to significant loss of the data utility. In this paper, we present dp-GAN, a general priv...

متن کامل

k-Same-Net: k-Anonymity with Generative Deep Neural Networks for Face Deidentification

Image and video data are today being shared between government entities and other relevant stakeholders on a regular basis and require careful handling of the personal information contained therein. A popular approach to ensure privacy protection in such data is the use of deidentification techniques, which aim at concealing the identity of individuals in the imagery while still preserving cert...

متن کامل

Analyzing Tools and Algorithms for Privacy Protection and Data Security in Social Networks

The purpose of this research, is to study factors influencing privacy concerns about data security and protection on social network sites and its’ influence on self-disclosure. 100 articles about privacy protection, data security, information disclosure and Information leakage on social networks were studied. Models and algorithms types and their repetition in articles have been distinguished a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016